The boundary for quantum advantage in Gaussian boson sampling

Identifying the boundary beyond which quantum machines provide a computational advantage over their classical counterparts is a crucial step in charting their usefulness. Gaussian boson sampling (GBS), in which photons are measured from a highly entangled Gaussian state, is a leading approach in pursuing quantum advantage. State-of-the-art GBS experiments that run in minutes would require 600 million years to simulate using the best preexisting classical algorithms. Here, we present faster classical GBS simulation methods, including speed and accuracy improvements to the calculation of loop hafnians. We test these on a ∼100,000-core supercomputer to emulate GBS experiments with up to 100 modes and up to 92 photons. This reduces the simulation time for state-of-the-art GBS experiments to several months, a nine–orders of magnitude improvement over previous estimates. Last, we introduce a distribution that is efficient to sample from classically and that passes a variety of GBS validation methods.


LOOP HAFNIAN ALGORITHMS
The loop hafnian function of an N × N symmetric matrix A is defined as where SPM is the set of single-pair matchings, the ways in which the indices [N ] can be grouped into sets of sizes 1 and 2, and A ij are the elements of A. This is a generalisation of the set of perfect matchings (all of the groupings into pairs) which defines the a hafnian, but with additional 'loops' referring to sets of size 1. These loops have weightings given by the diagonal elements of the matrix.

A. Eigenvalue-trace
The eigenvalue-trace algorithm for the loop hafnian (with N even 1 ) can be written as [20]: where z i ∈ {0, 1} and ⃗ z describes all possible N/2 length binary strings. The function f (A, ⃗ z) is defined as the λ N/2 coefficient of the polynomial: 1 For odd N , one can either pad A with an additional row/column [43], or modify the function f to account for a single unpaired edge. In our code [42], we opt to modify f .
where ⃗ v is a vector given by the diagonal elements of A, and X ⃗ z is defined like X, introduced in the main text, but with the 1s on the off-diagonal blocks replaced by ⃗ z: There are rows and columns of AX ⃗ z which are all 0s. These rows can be removed to speed up the evaluation of f . The eigenvalue-trace algorithm can be thought of as performing inclusion/exclusion over the set of pairs in one fixed perfect matching, defined by X. A derivation of this formula can be found in ref. [20], however note that equation 25 and 26 of ref. [20] differs as we apply the matrix multiplication by X ⃗ z inside the function f , whereas ref. [20] do this before passing the matrix to this function. This change in notation allows us to generalise this function in the next section.
The complexity of evaluating f is dominated by finding the traces of matrix powers, Tr(AX ⃗ z ) k , which can be reduced to finding the eigenvalues of (AX ⃗ z ) in O(N 3 ) time. Once we find the eigenvalues of (AX ⃗ z ), given by {e i }, we can compute trace powers as: Tr((AX ⃗ z ) k ) = i e k i . There are 2 N/2 terms in the summation in equation S2, so this results in O(N 3 2 N/2 ) complexity.

B. Repeated pairs
This algorithm makes use of a fixed perfect matching given by the adjacency matrix X, defining pairs (i, N/2 + i) for i ∈ [1, N/2). The summation in equation S2 corresponds to inclusion/exclusion of these pairs. If we consider the way that the A ⃗ n matrix is formed when evaluating the ⃗ n probability from a mixed state (described in the "multiple photons in the same mode" section of ref. [4]), X will pair the ith index in A with the (i + M )th index, and this pairing will be repeated n i times. Instead of summing over all z i ∈ {0, 1}, where each z i represents a different photon, we can instead let z i ∈ {0, 1, . . . , n i }, corresponding to including z i copies of the ith pair: We label this function lhafmix because it does not apply to general matrices, only those with the particular form of A ⃗ n . f ′ (A, γ, ⃗ z) is defined as the λ N/2 coefficient in the polynomial: The polynomials p ′ and p are almost identical, but in order to correctly construct the matrix associated with multiple photons arriving in the same mode, we must pass through γ explicitly, instead of inferring it from the diagonal elements of A. Details can be found in the "multiple bosons in the same mode" section of ref. [5]. Equation S5 makes use of the fact that increasing the weight of a pairing in X has the same effect as including a pair multiple times. This algorithm calculates mixed state probabilities in time O N 3 i (n i + 1) . Noting that this corresponds to a 2N × 2N loop hafnian, this compares well to using the repeated moment algorithm [19], which would take O N i (n i + 1) 2 , and improves on eigenvalue-trace whenever there are elements of ⃗ n greater than 1.
For general matrices such as those in pure state calculations, even if there are photon collisions which lead to repeated rows/columns in the B matrix, these do not necessarily lead to repeated pairings. However if identical pairs do occur, with the ith pair occurring η i times, we can make use of the above formula to obtain some speed-up, reducing the number of inclusion/exclusion terms to i (η i + 1). This quantity is upper-bounded by 2 N/2 , which occurs if no pairs are repeated. For even total photon number, it is lower-bounded by j n j + 1. To see this, consider that for a total of H unique pairings we can write: with n (i,k) j the number of photons from mode j which are associated with the ith pair and position k = L, R within that pair. The equality follows from the fact that only one mode will be associated with a particular (i, k), i.e. for a given (i, k) there is only one j for which n (i,k) j is non-zero. The factor associated with a given mode j is lower-bounded: which occurs when n (i,k) j is non-zero for only one choice of (i, k). Hence the overall number of inclusion/exclusion terms is lower-bounded by (S9)

C. Matching algorithm
Since the fixed perfect matching in the eigenvalue-trace algorithm is arbitrary, we can choose it so as to create identical pairs and reduce the number of steps. Equivalently, we can permute rows/columns in the input matrix so as to change the pairings created using the X matrix. Here, we give a greedy algorithm which chooses the pairings in the fixed perfect matching so as to minimise the number of inclusion/exclusion steps.
The algorithm takes as input the vector of photon numbers ⃗ n. We start by creating a list of mode labels ⃗ m = (0, 1, . . . , M − 1). This algorithm returns a list of mode pairings, ⃗ P , and how many times each pairing is repeated, ⃗ η. The algorithm then proceeds as follows: 1. Sort the modes in decreasing order of photon number, i.e. sort ⃗ n and ⃗ m according to ⃗ n 2. If n 1 ≥ 2n 2 : append ⌊n 1 /2⌋ to ⃗ η and (m 1 , m 1 ) to P . Subtract 2⌊n 1 /2⌋ from n 1 Else: append n 2 to ⃗ η and (m 1 , m 2 ) to ⃗ P . Subtract n 2 from both n 1 and n 2 .
3. Remove elements n i and m i from ⃗ n, ⃗ m for all n i = 0 4. If ⃗ n > 1: return to step 1 Else: return ⃗ P , ⃗ η and, for odd total photon number, the remaining element of ⃗ m Here we use ⌊·⌋ to notate rounding down to the nearest integer. An implementation of this algorithm can be found in the function matched_reps of our repository [42]. In Fig. S1, we show that this algorithm manages to achieve comparable scaling to the lower bound in equation S9, and all examples investigated are within a factor of 3 of this lower bound. Following the repeated pairs algorithm above, the loop hafnian can be calculated in time O N 3 (η i + 1) . This improves on eigenvalue-trace for a general matrix whenever ⃗ η contains an entry > 1, and this is true whenever there are at least two elements > 1 in ⃗ n, or if there is at least one element ≥ 4.

D. Finite difference sieve
By analogy to the Glynn formula for the permanent [22][23][24], we can find an alternative expression for the loop hafnian which uses a finite difference sieve instead of an inclusion/exclusion formula: where ⃗ δ describes all possible N/2 length vectors with δ i ∈ {−1, 1}. Here X ⃗ δ is defined as: Since an overall sign change to ⃗ δ leaves the terms inside the summation unchanged, one element of ⃗ δ can be fixed, e.g. δ 1 = 1, and the result multiplied by 2, halving the run-time. One way to think of the permanent or the loop  |. We plot the error for 10 random instances for each N using 64 nodes of the Isambard HPC system. hafnian is that it represents the coefficient of some polynomial. Calculating this coefficient can be done by evaluating derivatives of this polynomial at the origin. In this picture, equation S2 corresponds to a forward difference derivative, and equation S10 corresponds to a central difference derivative [23] We can make use of repeated pairings in the finite difference sieve algorithm in the same way as above. Then, for each pairing, δ runs from −n pair to +n pair in steps of 2, with the different terms corresponding to how many copies of the pair are associated with a −1 or +1.
We find that this method offers significant accuracy improvements over inclusion/exclusion, as shown in Fig. S2. In both algorithms, the absolute value of the machine precision errors accumulate at a fairly similar rate inside the sum, however in the finite difference sieve, the prefactor in equation S10 divides the value of the error by the number of terms in the sum.

E. Batching probability calculations
For each step of the chain-rule algorithm, we require probabilities where all but one mode has a fixed outcome, ⃗ n fixed , while one 'batched' mode takes all values from 0 to the cutoff, n cut .
In the pair matching algorithm, we only input ⃗ n fixed . If we consider the calculation when the batched mode is equal to n cut , this leaves ⌊n ncut /2⌋ copies of the batched mode paired to itself. Whilst computing the probability for the n cut outcome, we will also compute all the necessary eigenvalues required for calculating any outcome ≤ n ncut . Since finding these eigenvalues is the only cubic time step within each term in the sum, we can compute all probabilities for n ≤ n ncut in the same time complexity as calculating P (⃗ n fixed , n ncut ). For batching across sub-detectors within the same mode in threshold detector sampling, each detector is treated independently and so has a different β i , but this does not change the eigenvalue calculation, so these calculations can be batched in a similar way. We have implemented these methods in [42].
These methods could also be applied to speed up calculations of heralded non-Gaussian states in the Fock basis, as is described in ref. [37].

F. Implementation details
Our loop hafnian code is written in Python and uses Numba, a just-in-time compiler which automatically generates highly efficient code [44]. To run efficiently on distributed systems, we use MPI for Python [45]. The eigenvalue-trace algorithm is readily parallelisable as each term in the sum can be computed independently of all other terms.
Whilst testing and benchmarking our code, we ran on all major operating systems, and on x86-64 and arm64 architectures. Both Fugaku and the Isambard system (used for data in Fig. S2) use arm64 chip architectures, giving us further confidence in our run-time predictions.

A. Boson Sampling
During the early development of boson sampling, it was sometimes assumed that to simulate Boson sampling experiments, the full probability distribution must be computed [46]. It was later shown that approximate sampling methods, such as Metropolis Independence Sampling (MIS) and rejection sampling could be used to simulate boson sampling, without a 'brute-force' computation of the probability distribution [10]. Soon after, a probability chain-rule method was discovered which allowed for exact sampling [41]. This algorithm was subsequently updated to include speedups due to collisions [47] by using known results on computing permanents with repeated rows [16][17][18].

B. Gaussian Boson Sampling
The first reported simulation algorithm for GBS with number resolving detectors which improved upon the bruteforce approach was a chain-rule based algorithm which calculates mixed-state measurement probabilities by Quesada et al. [11]. Two alternate chain-rule approaches were reported shortly after by Wu et al. [12]. A quadratic speedup to ref. [11] was recently reported [9] which proposes a method where only pure-state probabilities are required.
Although not reported in the literature, an implementation of ref. [11] in The Walrus libary [43] uses the repeated moment hafnian algorithm [19], when collisions sufficiently dominate for this approach to be faster than eigenvaluetrace (without exploiting collisions).
For Gaussian Boson Sampling with threshold detectors, one can simulate number resolving detectors then record any outcomes with n ≥ 1 as a click. However, algorithms tailored to threshold detector GBS have also been reported. An exponential time, exponential space algorithm was used to simulate up to 20 click experiments [13]. However, a chain-rule method using the Torontonian has been proposed [11]. The time complexity of these algorithms is shown in table S1.
To visualise how these different algorithms establish a boundary for quantum advantage, we approximately estimate the required size of experiment for the runtime on Fugaku, the world's fastest supercomputer, to reach 1 day. This is shown in Fig. 1 F. However, we wish to emphasise that the choice of 1 day per sample is arbitrary and also that an experiment placing within the region of quantum advantage in this figure would not make for sufficient evidence to claim quantum advantage alone. It is also required to show evidence that the samples could not have been generated by faster, approximate algorithms. Without such evidence, errors from an experiment could lead to errors in the probability distribution which remove the computational complexity.
The boundary is chosen as the estimated experimental scale for the simulation time on Fugaku, the world's fastest supercomputer, to calculate a single probability. To allow comparison between different algorithms, we consider the sample to calculate the probability of an event where Nc detectors each measure n photons. Black dashed lines show this boundary when using threshold detectors, using the Torontonian function [7] or the algorithm presented in this work. The repeated moment (red) [19,43], eigenvalue-trace (blue) [20] and our results (green) for loop hafnians are compared to show the boundary for PNRDs.

MIS GBS ALGORITHMS
MIS is a Markov Chain Monte Carlo method of sampling which works by suggesting a state from a proposal distribution and accepting it according to the acceptance probability, equation 3. Otherwise, the previous state is added to the Markov chain.

A. Independent Pairs and Singles GBS distribution
Choosing a suitable proposal distribution is an extremely important factor for MIS to be useful. If the proposal distribution does not match closely to the target distribution, this will result in low acceptance probabilities, and hence a very long thinning interval. Here, we introduce an Independent Pairs and Singles (IPS) distribution, where as the name suggests we generate multi-photon samples from many independent single-photon and pair-photon generation processes, without quantum interference between separately generated singles/pairs. We find this is a better approximation to GBS than other efficiently simulable alternatives such as thermal states or distinguishable squeezed states.
Beginning from a pure Gaussian state which we wish to approximate, we first sample the number of individual photons created in each mode by the displacement, using Poisson distributions with the mean of the jth mode given by |α j | 2 . We then sample the number of photon pairs created by squeezing between all mode pairs (j, k) (with j ≤ k) from a Poisson distribution with mean given by |B| 2 j,k . Combining all outcomes results in a photon number pattern, ⃗ n.
For MIS, we must calculate the probability of our generated proposal sample, ⃗ n. There are many possible ways to create the same sample, corresponding to different groupings of the photons into pairs and singles. The total probability is related to a loop hafnian, which contains a corresponding sum over all single-pair matchings. We can write this probability as: where C ⃗ n is the matrix formed by taking |B ⃗ n | 2 and replacing the diagonal elements with |⃗ α ⃗ n | 2 .
The loop hafnian of a positive matrix is likely to be efficient to compute approximately [43,48]. However, in MIS we must also compute a loop hafnian of a complex matrix to evaluate the target probability of the sample. Hence for convenience and simplicity we make use of the same optimised and parallelised code to compute both loop hafnians, without losing any accuracy. This increases the run-time by at most a factor of 2.

B. PNRD GBS
We first consider the case of sampling in the photon number basis, ⃗ n. We expand the sample space to include a displacement variable, ⃗ α, so that only pure-state probabilities need to be evaluated. The target distribution P (⃗ n, ⃗ α) can be written as P (⃗ α)P (⃗ n|⃗ α) where P (⃗ α) is a multivariate normal distribution and hence efficient to sample from, while P (⃗ n|⃗ α) is given by equation 7 and depends on an N × N loop hafnian. We choose Q(⃗ α) = P (⃗ α), which results in the acceptance probability: so the acceptance probability does not depend on the probability density of ⃗ α. In some cases it may be useful to fix the total photon number N when sampling; for example verification methods often focus on samples of a particular N . In MIS it is possible to fix N by post-selecting our proposed states -this does not add appreciably to the run-time, since generating proposed states can be done efficiently and the computational effort is dominated by calculating p accept . In this case, the acceptance probability is where in the second line we used the definition of conditional probability P (⃗ n i , ⃗ α i , N ) = P (⃗ n i , ⃗ α i |N )P (N ), and the P (N )'s cancel and so do the Q(N )'s. In the third line, we know that if we are post-selecting, all ⃗ n will automatically satisfy N so it is a redundant variable. Hence an identical p accept can be used when fixing N .
We outline the algorithm below. To sample from a state with vector of means R and covariance matrix V : (c) Calculate the acceptance probability p accept .
(d) Add (⃗ n i , ⃗ α ′ i ) to the chain with probability p accept , otherwise add the previous state again. 6. Keep only the ⃗ n values in the chain (ignore ⃗ α ′ ). Discard the first τ burn samples and then keep every 1 in τ thin samples.

C. Threshold detector GBS
For MIS with threshold detectors, we also need to include the fan out of each mode into sub-detectors, where we only register the position of the 'first' photon. So we now expand the sample space to include a variable describing this position, x. We take the limit of a large number of sub-detectors where x becomes a continuous variable, and choose larger x to correspond to 'earlier' detections.
The POVM element for a click outcome where the first photon is at position x can be written: with p(x|n) = nx n−1 . π c (x) is closely related to the POVM element for measuring a single photon after a loss of x: Hence we can express the probability of a click pattern ⃗ c with an accompanying ⃗ x in terms of the probability of obtaining the same pattern of single photons, but from a covariance matrix V (⃗ x) where the loss x m has been applied to the mth mode (for unoccupied modes x m = 0): where as before ⃗ α ′ is a complex displacement vector chosen from a multivariate normal distribution. Since V (⃗ x) is a mixed state, we expand it as an ensemble of pure states with differing displacement vectors ⃗ α ′′ : where P (⃗ α ′′ |⃗ x, ⃗ α ′ ) is the probability distribution of ⃗ α ′′ , depending on the applied loss, ⃗ x, and the complex displacement before the loss, ⃗ α ′ . P n (⃗ c|⃗ x, ⃗ α ′′ ) is the photon number pattern probability of a pure state and can be calculated with an N c × N c loop hafnian, in time O(N 3 c 2 Nc/2 ), resulting in a quadratic speedup compared to a Torontonian. If we sample (⃗ c, ⃗ x, ⃗ α ′′ , ⃗ α ′ ) and then ignore the ⃗ x and ⃗ α outcomes, this is equivalent to sampling from P c (⃗ c) as desired.
To generate proposal samples, we begin by generating a displacement vector ⃗ α ′ and photon number pattern ⃗ n as in Appendix 3 B. Then a ⃗ x vector can be generated by sampling from p(x|n) for each element, and a click pattern ⃗ c taken by reducing each > 0 element of ⃗ n to a 1. The loss ⃗ x is applied to the state, resulting in an updated displacement ⃗ α ′ (x) and covariance matrix V (⃗ x), from which a Williamson decomposition can be used to sample a pure state -with a displacement ⃗ α ′′ and covariance matrix T ′ . The proposal probability, marginalised over ⃗ n, can be written where we note that the proposal distribution for (⃗ c, ⃗ x) is conditioned on ⃗ α ′ rather than ⃗ α ′′ , which is the last variable to be chosen. As with the target distribution, this probability can be rewritten in terms of a pattern of single photons after application of a loss: . (S20) The probability of detecting a pattern of single photons from IPS after loss is still given by the loop hafnian of a non-negative matrix: where we take except for diagonal elements and form C ⃗ c (⃗ x) by keeping the elements of C where c = 1. This results in an acceptance probability: .

(S24)
We outline the steps of the MIS algorithm below.
1. Use the Williamson decomposition to write V = T + W , where T is the covariance matrix of a pure state.
2. Sample the starting state from the proposal distribution (a) Sample a complex displacement vector ⃗ α ′ 1 . (b) Sample a photon pattern ⃗ n 1 from Q(⃗ n|⃗ α ′ ). Find ⃗ c 1 from ⃗ n 1 by fixing all n i > 1 as c i = 1. (c) If post-selecting on the number of clicks, repeat the above steps until ⃗ c contains the desired number of clicks. (d) Sample the loss, ⃗ x 1 , conditional on the photon number pattern ⃗ n 1 using p(x|n). (e) Apply the loss ⃗ x 1 to the displacement vector ⃗ α ′ 1 and the covariance matrix T , resulting in ⃗ α ′ 1 (⃗ x) and V (⃗ x). (f) Perform a Williamson decomposition on the mixed state to obtain a pure state covariance matrix T ′ and sample a new complex displacement vector ⃗ α ′′ 1 . 3. Start the Markov chain from the state (⃗ c 1 , ⃗ x 1 , ⃗ α ′ 1 , ⃗ α ′′ 1 ). Calculate the target probability using equation S18 and the proposal probability using equation S19.

D. Thinning interval and burn-in time scaling
To investigate the scaling of our algorithms, we fix the number of photons to the mean photon number number, rounded to the nearest integer. The tests described in this section are applicable to both PNRD and threshold GBS unless stated otherwise. However, we only implement them for the number resolving case.
It is important to be able to predict the run-time of simulations before they are performed. For MIS methods, this is challenging as thinning intervals and burn-in times depend on how close the proposal distribution is to the target distribution. To construct heuristics to allow us to make these predictions, we investigate how the thinning interval and burn-in time scale with the number of modes M . However, we wish to highlight that the requirements on accuracy and sample autocorrelation will vary depending on what is desired from the simulation. Therefore the results in this section should be viewed as a guide for how to predict the scaling, rather than as a prescriptive guide for what parameters should be used.
To predict the thinning interval, τ thin , we investigate systems of different sizes with M varied between 8 and 52 in steps of 4. For each M we choose 10 Haar random interferometers, and implement an MIS chain with 10,000 steps. In Fig. S4, we plot the estimated probability of a sample being repeated as a function of the thinning interval. From this we extract the thinning interval required to suppress the repeat probability to 0.1 for each M and perform a linear fit on this data, shown in Fig. S5.
The data for M = 36 appears anomalous. We believe this is caused by one of the chains drawing a proposal sample which has an unusually large target/proposal probability ratio. Such events can cause chains to reject a very large number of samples before accepting a new proposal sample. The large degree of autocorrelation which is created by events such as this are an intrinsic drawback of the MIS method, and so we do not discard this data.
The second parameter we need to determine is the burn-in time, τ burn . We know that the chain begins sampling from the proposal distribution and over time converges to the target distribution. It will converge continuously, getting asymptotically closer to the target distribution, but at some point it will be close enough that the change will not be noticeable from finite sample sizes. Therefore, we can analyse when our distribution appears to be stationary. We provide two tests to predict how the burn-in time scales with the number of modes.
For the first, we use a Bayesian likelihood ratio test for each burn-in time until we see no improvement. The likelihood ratio tests whether a set of samples s is more likely to have come from the target distribution or an adversary distribution. To test how close our distribution is to the target, P, or proposal, Q, we choose the proposal distribution as the adversary. We begin with the ratio If we assume equal priors p(P) = p(Q), this simplifies to Assuming that the probability distribution is either P or Q and so p(P|s) + p(Q|s) = 1, we can write Here our samples s i are described by (⃗ α i , ⃗ n i ) which we sample from the chain. So we can write p(s i |P) = p(⃗ α i , ⃗ n i |P) = p(⃗ α i |P)p(⃗ n i |⃗ α i , P). For purposes of benchmarking the efficiency, we fix the number of photons and so we have to adjust for post-selecting on N photons. We still assume equal priors, now p(P|N ) = p(Q|N ). So the likelihood ratio becomes where we use the fact that P (⃗ α i ) = Q(⃗ α i ). This only requires the calculation of pure-state probabilities and the probability of getting N photons in both the proposal and target distribution. This can be done for PNRDs with no additional cost to the sampling algorithm as we must calculate the pure-state probabilities in the formation of our chain, and calculating the probabilities of N photons is efficient. However, for threshold detectors, although we could add the x variable, we are not aware of a way to calculate the probability of N c clicks for either distribution. Therefore we do not apply this test to threshold detectors. We evaluate this probability for all burn-in times up to 100, for an increasing sample size up to 100. As we increase the sample size, the likelihood should eventually converge to either 0 (if it fails) or 1 (if it passes) -see Fig. S6. The closer the sampled distribution is to the target, the faster it converges to 1, so we can test how close our distribution is for different burn-in times by sampling and comparing the rate of the convergence to 1. To isolate the burn-in for testing, we need to start a new chain every time we sample. As a metric for comparing the rates of convergence, we find the sample size required to reach a likelihood of 0.95. As we increase the burn-in time, the sample size should decrease and approach the minimum. We find the burn-in time at which the sample size is within 5% of the estimated minimum. Each likelihood is estimated by averaging over 1000 Haar random unitaries. Despite this, the likelihood still gives quite noisy data and we further average over a range of 10 burn-in times, ie the likelihood at burn-in i is given by the average of the likelihood for burn-in times between i and i + 9 (see Fig. S7). The minimum is estimated in a similar way where we average over the last 20 burn-in times, when we can assume it has converged. Fig. S8 shows Supplementary Figure S7. Likelihood based burn-in time convergence. The estimated sample size required to give a likelihood ratio of 0.95 for burn-in times between 0 and 100 (averaging over 10 burn-in times). We wish to find the burn-in time beyond which we see no improvement in the number of samples required, as indicated by the dashed lines.
the estimated burn-in time for up to 28 modes and we extrapolate the linear fit to give an estimate of τ burn = 155 for 100 modes. For the second test, we note that the rate of accepting a proposed sample decreases towards an asymptotic minimum value as the chain converges. This minimum value would be reached only when sampling from the target distribution. We estimate the probability of accepting at each burn-in time up to 300 by running 10,000 chains and counting the number of times we accept for a Haar random unitary. As with the likelihood test, we still have noisy data and smooth out the curve by averaging across 10 burn-in times. We choose the burn-in time at which the probability of accepting is no more than 0.001 greater than the estimated minimum value. Again we estimate the minimum value from the end of our chain, averaging over the last 50 burn-in times. As long as the estimated burn-in time is significantly before the end of the chain, we can be reassured that the probability of accepting is changing slowly enough to consider the chain to have converged by the maximum burn-in time we test. See Fig. S9 for an example of how the acceptance rate varies with the chain length. We run this test for 10 Haar random unitaries and find the average burn-in time for each M up to 24, shown in Fig. S10, which extrapolating gives an estimate of τ burn = 785 for 100 modes. We note that these two tests give significantly different estimates for the burn-in times. This is likely due to two reasons. The first is that the likelihood may be less sensitive to the convergence and doesn't distinguish between two close distributions as well. The data for this test is more noisy and so that may hide small differences in the distributions. The second is that we fix it to have converged further for the acceptance rate test. It is an arbitrary choice to decide how close you require the distribution to be to the target distribution. In both tests, we are limited by how noisy our data is from finite sampling. If our acceptance rate test is less noisy, we are able to find the burn-in times for a better convergence. The important findings from our numerical analysis are that the burn-in time seems to scale approximately linearly with the number of modes and the gradient of the scaling depends on how close you want the distribution you are sampling from to be to the target distribution. Convergence to 1 for the IPS samplers indicates they are more likely to have come from the click distribution than the thermal samples.

A. CHOG ratio
As we approach a scale where exact validation of samples becomes unfeasible, we can use the Chen Heavy Output Generation (CHOG) ratio test outlined in ref. [6]. Certain output patterns from a random optical network occur more frequently due to constructive interference and it is thought to be difficult to replicate this observation with classical samplers. This adversarial test assesses the relative likelihood of two sets of samples ('trial' and 'adversarial') being drawn from a given ideal distribution. As samples are drawn, the CHOG ratio is updated: Convergence to a value of 1 indicates that the trial samples were more likely to be drawn from the ideal distribution than the adversarial samples. In the case of click detection samples, the probabilities of observing these samples from the ideal (squeezed) distribution are calculated using the Torontonian.
In the work by USTC [6], click samples are drawn from the trial GBS experiment and an adversarial thermal sampler. A value of 1 indicates that the GBS samples were more likely to be drawn from the ideal distribution than the thermal samples. For Jiȗzhāng, GBS samples corresponding to fixed total numbers of clicks of between 26 and 38 were validated against a thermal sampler.

IPS vs thermal
Our IPS sampler (used as a proposal distribution for the MIS algorithm) naturally incorporates constructive interference of pairs of photons. Here, we use it to generate trial samples and apply the CHOG test to validate against an adversarial thermal sampler. For the IPS sampler we use a covariance matrix corresponding to 13 sources of two-mode squeezed vacuum, with squeezing parameter r = 1.55 and transmission η = 0.3, injected into a Haar random 52-mode unitary interferometer. The covariance matrix for the thermal sampler is constructed using the same transmissions and unitary interferometer, but now injected with 26 thermal states, each with mean photon number n th = sinh 2 (r). We update the CHOG ratio using equation S32 and results for click numbers of between 13 and 18 are shown in Fig. S11. The convergence to 1 for the IPS sampler shows that those samples are more likely to have been drawn from the ideal distribution than the thermal samples.
Hence, we have shown that our IPS sampler -from which samples can be efficiently drawn classically -passes the CHOG test against a thermal sampler in a similar way to the experimental GBS samples from Jiȗzhāng. This challenges the usefulness of the CHOG test against a thermal adversarial sampler in validating quantum computational complexity of GBS. It also suggests that the IPS distribution should be used as an adversary model to test against in future experiments. Because the IPS distribution contains no interference between different photon pairs, it could be considered as the distribution generated by squeezers with zero spectral purity. Following this intuition, we also suggest a finite purity adversary (i.e. squeezing across ≥ 2 Schmidt modes) as another important, more challenging, model to test against. Calculating probabilities of Gaussian states in the presence of spectral impurity has been investigated in ref. [25].

MIS vs IPS
Our MIS method takes the IPS as its proposal distribution and should then converge to the target (Torontonian) distribution. Here, we use a CHOG ratio test to validate trial MIS samples against the IPS distribution as the adversary. Our discussion in the previous section showed that IPS samples are more likely to be drawn from the ideal distribution than thermal samples, and so here they should provide a more stringent test.
We use a covariance matrix corresponding to 6 sources of two-mode squeezed vacuum, with squeezing parameter r = 1.55 and transmission η = 0.3, injected into a Haar random 24-mode unitary interferometer. We draw 10 5 samples from the IPS distribution and use these in an MIS chain with burn-in of 50 and a thinning interval of 10. We then post-select for samples of different fixed total click numbers and update the CHOG ratio. Results for click numbers of between 9 and 12 are shown in Fig. S12. The convergence to 1 for the MIS samples shows that they are more likely to have been drawn from the ideal distribution than the starting IPS samples, and this is indicative of convergence of the chain.

B. Output correlations
In this section we compare statistical correlations in output click patterns obtained from IPS samples to the exact, Torontonian distribution. We use a covariance matrix corresponding to 8 sources of two-mode squeezed vacuum, with squeezing parameter r = 1.55 and transmission η = 0.3, injected into a Haar random 32-mode unitary interferometer. We draw 10 6 IPS samples over the 32 modes and convert them to click patterns. These can be used to estimate click probabilities over different output modes. We also use the Torontonian to calculate exact click probabilities. We estimate the single-mode click probabilities from IPS samples by dividing the number of clicks on a specific mode in those samples by the total. These are compared to the exact probabilities in Fig. S13a. We find that the values estimated from IPS samples agree with the ideal.
Next we consider second-order correlators that have been proposed as a benchmark for GBS [49]. For some state of light emerging from an optical network, they are defined as: where the projector Π i = I − |0⟩ i ⟨0| i corresponds to a click on mode i. We use IPS samples to estimate the probability of clicks on modes i and j and, together with the single-mode click probabilities from earlier, estimate C i,j for all pairs of output modes. A comparison to exact values is given in Fig. S13b. Although the points do not fall on a line of unit gradient, the positive correlation between the values indicates that the IPS samples are capturing underlying second-order correlations in the state of light from a GBS device. Following [30], we also investigate higher third-order correlations of our IPS samples. These are defined as: The estimated values are compared to the exact in Fig. S13c. Again a positive correlation in the plot of estimated against exact values suggests IPS samples encode some of the higher-order correlations present in the GBS statistics.